Efficient text fingerprinting via Parikh mapping
Abstract
We consider the problem of fingerprinting text by sets of symbols. Specifically, if S is a string of length n over a finite, ordered alphabet Σ, and S′ is a substring of S, then the fingerprint of S′ is the subset φ ⊆ Σ consisting of exactly the symbols that appear in S′. In this paper we present efficient methods for answering various queries on fingerprint statistics. Our preprocessing runs in O(n|Σ| log n log |Σ|) time and enables answering the following queries: (1) given an integer k, compute the number of distinct fingerprints of size k in O(1) time; (2) given a set φ ⊆ Σ, compute the total number of distinct occurrences in S of substrings with fingerprint φ in O(|Σ| log n) time. © 2003 Elsevier B.V. All rights reserved.
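To make the two queries concrete, here is a minimal Python sketch. It is not the paper's O(n|Σ| log n log |Σ|) construction; it is a simpler enumeration built on the observation that, for a fixed start position i, the fingerprint of S[i..j] changes only when j reaches the first occurrence (at or after i) of a symbol not yet seen, so each start position contributes at most |Σ| distinct fingerprints. The function name `fingerprint_stats` is hypothetical, and "occurrence" is assumed here to mean a position pair (i, j) whose substring S[i..j] has fingerprint exactly φ.

```python
from collections import Counter


def fingerprint_stats(s: str):
    """Hypothetical helper (not the paper's algorithm).

    Returns (size_counts, occurrences):
      size_counts[k]  -> number of distinct fingerprints of size k
      occurrences[fp] -> number of pairs (i, j), i <= j, such that the set of
                         symbols in s[i..j] equals the frozenset fp
    """
    n = len(s)
    occurrences = Counter()
    # next_pos[c] = smallest index >= i at which symbol c occurs.
    # Scanning i from right to left keeps it current with one update per step.
    next_pos = {}
    for i in range(n - 1, -1, -1):
        next_pos[s[i]] = i
        # Symbols sorted by their first occurrence at or after i.
        order = sorted(next_pos.items(), key=lambda kv: kv[1])
        current = set()
        for m, (c, p) in enumerate(order):
            current.add(c)
            # Every end j with p <= j < (next first occurrence) gives the same set.
            end = order[m + 1][1] if m + 1 < len(order) else n
            occurrences[frozenset(current)] += end - p
    size_counts = Counter(len(fp) for fp in occurrences)
    return size_counts, occurrences


size_counts, occurrences = fingerprint_stats("abac")
print(size_counts[2])                    # distinct fingerprints of size 2
print(occurrences[frozenset("ab")])      # occurrences with fingerprint {a, b}
```

After this preprocessing, query (1) is the lookup `size_counts[k]` and query (2) is `occurrences[frozenset(phi)]`. The trade-off is cost: this sketch takes roughly O(n|Σ|²) time and may store up to n|Σ| sets, which is why the paper's more careful preprocessing matters for large alphabets.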
Similar articles
A Matrix q-Analogue of the Parikh Map
We introduce an extension of the Parikh mapping called the Parikh q-matrix mapping, which takes its values in matrices with polynomial entries. The morphism constructed represents a word over a k-letter alphabet as a (k + 1)-dimensional upper-triangular matrix with entries that are nonnegative integral polynomials in variable q. We show that by appropriately embedding the k-letter alphabet into the -lette...
Plagiarism checker for Persian (PCP) texts using hash-based tree representative fingerprinting
With due respect to authors’ rights, plagiarism detection is one of the critical problems in text mining that interests many researchers. The issue is considered a serious one in higher academic institutions. Language-independent tools exist, but they do not yield reliable results because they ignore the special features of each language. Considering the paucit...
Quantum fingerprints that keep secrets
We introduce a new type of cryptographic primitive that we call hiding fingerprinting. A (quantum) fingerprinting scheme translates a binary string of length n to d (qu)bits, typically d ≪ n, such that given any string y and a fingerprint of x, one can decide with high accuracy whether x = y. Classical fingerprinting schemes cannot hide information very well: a classical fingerprint of x that g...
A q-Matrix Encoding Extending the Parikh Matrix Mapping
We introduce a generalization of the Parikh mapping called the Parikh q-matrix encoding, which takes its values in matrices with polynomial entries. The encoding represents a word w over a k-letter alphabet as a (k + 1)-dimensional upper-triangular matrix with entries that are nonnegative integral polynomials in variable q. Putting q = 1, we obtain the morphism introduced by Mateescu, Salomaa, ...
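The q = 1 case mentioned above is the ordinary Parikh matrix. As a hedged illustration of that case only (not the q-matrix encoding itself), the Python sketch below builds the (k + 1) × (k + 1) upper-triangular matrix by multiplying, letter by letter, generator matrices that equal the identity plus a single 1 just above the diagonal. The function name `parikh_matrix` is hypothetical. In the result, entry [i][j] with i < j counts occurrences of the scattered subword formed by the i-th through (j − 1)-th alphabet letters, so the superdiagonal holds the ordinary Parikh vector.

```python
from typing import List


def parikh_matrix(word: str, alphabet: str) -> List[List[int]]:
    """Hypothetical helper: Parikh matrix of `word` over the ordered `alphabet` (q = 1 case)."""
    k = len(alphabet)
    index = {c: t for t, c in enumerate(alphabet)}
    # The empty word maps to the (k+1) x (k+1) identity matrix.
    m = [[int(r == c) for c in range(k + 1)] for r in range(k + 1)]
    for ch in word:
        t = index[ch]
        # Right-multiplying by (identity + unit entry at (t, t+1))
        # adds column t into column t+1; column t itself is unchanged.
        for r in range(k + 1):
            m[r][t + 1] += m[r][t]
    return m


for row in parikh_matrix("abba", "ab"):
    print(row)
```

For "abba" over the ordered alphabet a < b, the top row reads 1, |w|_a = 2, |w|_ab = 2, and the entry above the diagonal in the middle row is |w|_b = 2.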
Codifiable Languages and the Parikh Matrix Mapping
We introduce a couple of families of codifiable languages and investigate properties of these families as well as interrelationships between different families. We also develop an algorithm based on the Earley algorithm to compute the values of the inverse of the Parikh matrix mapping over a codifiable context-free language. Finally, an attributed grammar that computes the values of the Parikh mat...
Journal:
- J. Discrete Algorithms
Volume: 1
Publication date: 2003